Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis

Authors

  • Hongwu YANG
  • Helen M. MENG
  • Lianhong CAI
Abstract

This paper proposes a novel approach for describing the expressivity of dialog text and modeling its acoustic correlates for expressive text-to-speech (TTS) synthesis. We use dialog acts (DAs) to describe expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect a tourism-domain corpus and annotated it with DAs. A pitch target model, optimized to describe Mandarin F0 contours, is introduced to model the pitch contour of Mandarin syllables. A Generalized Regression Neural Network (GRNN) based model is then developed that transforms the acoustic features of neutral speech (pitch target model parameters, duration, energy, and pauses) to resemble expressive speech, according to the DA of the input text. Perceptual evaluation of the modified speech outputs shows that over 63% of the utterances carry appropriate expressivity. Expressive Mean Opinion Score results also show that the modified speech is more expressive than the original neutral speech.
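The abstract gives no equations, so the following is only a minimal sketch of the described pipeline under stated assumptions, not the authors' implementation. It assumes a first-order pitch target form, F0(t) = a*t + b + beta*exp(-lam*t), in which a syllable's F0 approaches the linear target a*t + b as the transition term decays. It also assumes Specht's standard GRNN, which predicts a Gaussian-kernel weighted average of stored training targets. The feature layout, the per-DA model dictionary, the DA label "question", the input standardization step, and all numbers are hypothetical toy values, not data from the paper.

```python
import numpy as np


def pitch_target_f0(a, b, beta, lam, duration, n_points=20):
    """Syllable F0 contour under an ASSUMED first-order pitch target model:
    F0(t) = a*t + b + beta*exp(-lam*t), for 0 <= t <= duration.
    a*t + b is the linear target; beta*exp(-lam*t) is the decaying transition."""
    t = np.linspace(0.0, duration, n_points)
    return a * t + b + beta * np.exp(-lam * t)


class GRNN:
    """Generalized Regression Neural Network: output is a Gaussian-kernel
    weighted average of the stored training targets."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma  # kernel width (smoothing parameter)

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        # Design choice (assumption): standardize inputs so that no single
        # feature (e.g. lam) dominates the Euclidean distance.
        self.mu = X.mean(axis=0)
        sd = X.std(axis=0)
        self.sd = np.where(sd > 0, sd, 1.0)
        # Training only stores the patterns; there is no iterative optimization.
        self.X = (X - self.mu) / self.sd
        self.Y = np.asarray(Y, dtype=float)
        return self

    def predict(self, X):
        X = (np.atleast_2d(np.asarray(X, dtype=float)) - self.mu) / self.sd
        # Squared distance from each query to each stored pattern.
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        # Subtracting the row minimum leaves the normalized weights unchanged
        # but keeps one weight equal to 1, avoiding 0/0 for distant queries.
        d2 -= d2.min(axis=1, keepdims=True)
        w = np.exp(-d2 / (2.0 * self.sigma ** 2))
        # Weighted sum of targets divided by the sum of weights.
        return (w @ self.Y) / w.sum(axis=1, keepdims=True)


# Toy, made-up features per syllable: [a, b, beta, lam, duration, energy].
# Real pairs would come from aligned neutral and expressive recordings.
neutral = np.array([[0.1, 5.0, 0.3, 8.0, 0.20, 0.60],
                    [-0.2, 5.2, -0.1, 6.0, 0.25, 0.55],
                    [0.0, 4.8, 0.2, 7.0, 0.18, 0.50]])
expressive_question = neutral * np.array([1.5, 1.05, 1.2, 1.0, 1.1, 1.2])

# One GRNN per dialog act (hypothetical organization and DA label).
models = {"question": GRNN(sigma=0.3).fit(neutral, expressive_question)}

# Map one neutral syllable to expressive parameters, then rebuild its F0.
pred = models["question"].predict(neutral[0])[0]
a, b, beta, lam, dur, energy = pred
contour = pitch_target_f0(a, b, beta, lam, dur)
```

A small sigma makes the GRNN behave like a nearest-neighbor lookup, while a large sigma averages over all training pairs; in practice sigma would be tuned on held-out data for each DA.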

Publication date: 2008